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^ This paper presents a nonlinear mixing model for hyperspectral image unmixing. The proposed 

-^ model assumes that the pixel reflectances are post-nonlinear functions of unknown pure spectral 

(^\ components contaminated by an additive white Gaussian noise. These nonlinear functions are 

approximated using polynomials leading to a polynomial post-nonlinear mixing model. A Bayesian 
algorithm is proposed to estimate the parameters involved in the model yielding an unsupervised 
nonlinear unmixing algorithm. Due to the large number of parameters to be estimated, an efficient 
Hamiltonian Monte Carlo algorithm is investigated. The classical leapfrog steps of this algorithm are 
modified to handle the parameter constraints. The performance of the unmixing strategy, including 

^-H convergence and parameter tuning, is first evaluated on synthetic data. Simulations conducted 
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^\ with real data finally show the accuracy of the proposed unmixing strategy for the analysis of 

.^ hyperspectral images. 
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I. Introduction 

Identifying macroscopic materials and quantifying the proportions of these materials are 
major issues when analyzing hyperspectral images. This blind source separation problem, also 
referred to as spectral unmixing (SU), has been widely studied for the applications where 
the pixel reflectances are linear combinations of pure component spectra [|T|-[|5|. However, 
as explained in [[6|, [[7|, the linear mixing model (LMM) can be inappropriate for some 
hyperspectral images, such as those containing sand, trees or vegetation areas. Nonlinear 
mixing models (NLMMs) provide an interesting alternative for overcoming the inherent 
limitations of the LMM. They have been proposed in the hyperspectral image literature 
and can be divided into two main classes. 

The first class of NLMMs consists of physical models based on the nature of the envi- 
ronment. These models include the bidirectional reflectance based model proposed in [[8| 
for intimate mixtures associated with sand-like materials and the bilinear models recently 
studied in [|9||-[12| to account for scattering effects mainly observed in vegetation and urban 



areas. The second class of NLMMs contains more flexible models allowing different kinds of 
nonlinearities to be approximated. These flexible models are constructed from neural networks 
p3| , p^ , kernels [15|, [16|, or post-nonlinear transformations [17|, [18|. In particular, a 



polynomial post-nonlinear mixing model (PPNMM) has recently shown interesting properties 
for the SU of hyperspectral images [191. 

Most nonlinear unmixing strategies available in the literature are supervised, i.e., the 
endmembers contained in the image are assumed to be known (chosen from a spectral 
library or extracted from the data by an endmember extraction algorithm (EEA)). Moreover, 



most existing EEAs rely on the LMM [20|-[22| and thus can be inaccurate for nonlinear 



mixtures. Recently, a nonlinear EEA based on the approximation of geodesic distances has 
been proposed in p3| to extract endmembers from the data. However, this algorithm can 
suffer from the absence of pure pixels in the image (as most hnear EEAs). 

This paper presents a fully unsupervised Bayesian unmixing algorithm based on the PP- 



NMM studied in [19|. In the Bayesian framework, appropriate prior distributions are chosen 
for the unknown PPNMM parameters, i.e., the endmembers, the mixing coefficients, the 
nonlinearity parameters and the noise variance. The joint posterior distribution of these 
parameters is then derived. However, the classical Bayesian estimators cannot be easily 
computed from this joint posterior. To alleviate this problem, a Markov chain Monte Carlo 
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(MCMC) method is used to generate samples according to the posterior of interest. More 
precisely, due to the large number of parameters to be estimated we propose to use a 
Hamiltonian Monte Carlo (HMC) [24| method to sample according to some conditional 
distributions associated with the posterior. HMCs are powerful simulation strategies based 
on Hamiltonian dynamics which can improve the convergence and mixing properties of 
classical MCMC methods (such as the Gibbs sampler and the Metropolis-Hastings algorithm) 



1 25 1, [26|. These methods have received growing interest in many applications, especially 



when the number of parameters to be estimated is large [27 1, [28 1. The classical HMC can 



only be used for unconstrained variables. However, new HMC methods have been recently 



proposed to handle constrained variables [ [25| Chap. 5] [ |29| , pO| which allow HMCs to 
sample according to the posterior of the Bayesian model proposed for SU. Finally, as in any 
MCMC method, the generated samples are used to compute Bayesian estimators as well as 
measures of uncertainties such as confidence intervals. 

The paper is organized as follows. Section |Il] introduces the PPNMM for hyperspectral 



image analysis. Section III presents the hierarchical Bayesian model associated with the 



proposed PPNMM and its posterior distribution. The constrained HMC (CHMC) algorithm 



used to sample some parameters of this posterior is described in Section IV The CHMC is 



coupled with a standard Gibbs sampler presented in Section |VJ Some simulation results 



conducted on synthetic and real data are shown and discussed in Sections VI and VII 



Conclusions are finally reported in Section VIII 



II. Problem formulation 
A. Polynomial post-nonlinear mixing model 



This section recalls the nonlinear mixing model used in [19| for hyperspectral image SU. 
We consider a set of N observed spectra y„ = [yn,i, • • • , yn,L]'^, n E {1, . . . , N} where L is 
the number of spectral bands. Each of these spectra is defined as a nonlinear transformation 
g^ of a linear mixture of R spectra vnj. contaminated by additive noise 

yn = 9n\y2i ^^:""^^ ) + e„ = g„ (Ma„) + e„ (1) 

where uir = [m^,!, . . . ,mr^L]'^ is the spectrum of the rth material present in the scene, ar^n 
is its corresponding proportion in the nth pixel, R is the number of endmembers contained 
in the image and gr„ is a nonlinear function associated with the nth pixel. Moreover, e„ 
is an additive independently distributed zero-mean Gaussian noise sequence with diagonal 
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,^iris 



covariance matrix S = diag (cr^), denoted as e„ ~ J\f (0^, S), where cr^ = [a 

the vector of the L noise variances and diag (cr^) is an L x L diagonal matrix containing the 

elements of the vector cr^. Note that the usual matrix and vector notations M = [mi, . . . , uin] 



and a„ = [ai„, . . . ,a.ij,n]^ have been used in the right hand side of ([T]). As in |19|, the A^ 
nonlinear functions g^ are defined as second order polynomial nonlinearities defined by 



9r 



[0,1]-^ ->M^ 



S H^ [Si + bnSl, ...,Sl + bnSl] 



2lT 



(2) 



with s = [si, . . . , sl]'^ and 6„ is a real parameter. An interesting property of the resulting 
nonlinear model referred to as polynomial post nonlinear mixing model (PPNMM) is that it re- 
duces to the classical LMM for 6„ = 0. Motivations for considering polynomial nonlinearities 



have been discussed in [ 19 1. In particular, it has been shown that the PPNMM is very flexible 
to approximate many different nonlinearities and can be used for nonlinearity detection. 
Straightforward computations allow the PPNMM observation matrix to be expressed as 
follows 

Y = MA + [(MA) (MA)] diag (b) + E (3) 

where A = [ai, . . . , a^-] is an R x N matrix, Y = [yi, . . . , yiy] and E = [ei, . . . , ejv] are 
L X N matrices, b = [6i, . . . , Bn]'^ is an A^ x 1 vector containing the nonlinearity parameters 
and denotes the Hadamard (termwise) product. 



B. Abundance reparametrization 

Due to physical considerations, the abundance vectors a„ satisfy the following positivity 



and sum-to-one constraints 



R 



^ar,„ = l, ar.,n > 0,Vr e {1,...,-R} 



(4) 



r=l 



To handle these constraints, we propose to reparameterize the abundance vectors belonging 
to the following set 



S = la= [ai,...,aRY 
using the following transformation 



R 



> 0,y^ar = 1 



(5) 



r=l 




(6) 
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This transformation has been recently suggested in [31]. One motivation for using the latent 
variables Zr,n instead of ar,n is the fact that the constraints (|4]) for the rath abundance vector 
a„ express as 

0<2,,„<1, VrG{l,...,/2-l} (7) 

for the nth coefficient vector 2„ = [2:1 „, . . . ,2;r-i,„]^. As a consequence, the constraints (|7]) 
are much easier to handle for the sampling procedure than (|4]) (as will be shown in Sections 
IV and |V]). The next section presents the Bayesian model associated with the PPNMM ([T]) 



for SU. 



in. Bayesian model 



This section generalizes the hierarchical Bayesian model introduced in [19| in order to 
jointly estimate the abundances and endmembers, leading to a fully unsupervised hyper- 
spectral unmixing algorithm. The unknown parameter vector associated with the PPNMM 
contains the reparameterized abundances Z = [zi, . . . , zn] (satisfying the constraints Q), the 
endmember matrix M, the nonlinearity parameter vector b and the additive noise variance cr^. 
This section summarizes the likelihood and the parameter priors (associated with the proposed 
hierarchical Bayesian PPNMM) introduced to perform nonlinear unsupervised hyperspectral 
unmixing. 

A. Likelihood 

Equation ([3]) shows that yn|M, z^, 6„, a^ is distributed according to a Gaussian distribution 
with mean gi,„ (Ma„) and covariance matrix S, denoted as y„|M, z„, 6„, cr^ ~ A^ [g,^^ (Ma„) , S) 
Note that the abundance vector a„ should be denoted as an{Zn). However, the argument z^ 
has been omitted for brevity. Assuming independence between the observed pixels, the joint 
likelihood of the observation matrix Y can be expressed as 

' (Y - xVi:-^(Y - XV 



/(Y|M,Z,b,o-2)oc |S|-^/2g^r 



2 
where oc means "proportional to", etr(-) denotes the exponential trace and X = MA 

[(MA) (MA)] diag (b) is an L x A^ matrix. 



(8) 
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B. Parameter priors 

1) Coefficient matrix Z; To reflect the lack of prior knowledge about the abundances, 
we propose to assign prior distributions for the coefficient vector 2„ that correspond to 
noninformative prior distributions for a„. More precisely, assigning the following beta priors 

Zn,r-^Be{R-r,l) r e {1, . . . , i? - 1} (9) 

and assuming prior independence between the elements of 2„ yield an abundance vector 

a„ uniformly distributed in the set defined in ([5]) (see [31 1 for details). Assuming prior 

independence between the coefficient vectors {zn}^^i ^ leads to 

R-i r N >| 

where B{-,-) is the Beta function. 

2) Endmembers: Each endmember m^ = [m^^i, . . . , m,,/,]^ is a reflectance vector satisfy- 
ing the following constraints 

0<m,,f < l,Vr G {1,...,/?},V£g {1,...,L}. (11) 

For each endmember nij., we propose to use a Gaussian prior 

nir ~ A/'[o,i]i(mr., s^Il), (12) 

truncated on [0, 1]^ to satisfy the constraints ( [TT] ). In this paper, we propose to select the 
mean vectors rrir as the pure components previously identified by the nonlinear EEA studied 
in [23 1 and referred to as "Heylen". The variance s^ reflects the degree of confidence given 



to this prior information. When no additional knowledge is available, this variance is fixed 
to a large value (s^ = 50 in our simulations). Note that any EEA could be used to define the 
vectors mi, . . . , rriR. 

3) Nonlinearity parameters: The PPNMM reduces to the LMM for 6„ = 0. Since the 
LMM is relevant for most observed pixels, it makes sense to assign prior distributions to the 
nonlinearity parameters that enforce sparsity for the vector b. To detect linear and nonlinear 
mixtures of the pure spectral signatures in the image, the following conjugate Bernoulli- 
Gaussian prior is assigned to the nonlinearity parameter 6„ 

1 f hi 



f{bn\w, al) = (1 - w)5{bn) + w^== exp ( --\ ) (13) 



where (5(-) denotes the Dirac delta function. Note that the prior distributions for the non- 
linearity parameters {6„}^^j^ ^ share the same hyperparameters w G [0, 1] and ol G E"*". 
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More precisely, the weight w is the prior probability of having a nonlinearly mixed pixel in 
the image. Assuming prior independence between the nonlinearity parameters {bn}n=i n ' 
the joint prior distribution of the nonlinearity parameter vector b can be expressed as follows 

N 

f{b\w,al) = l[fib^\w,al) (14) 

n=l 

4) Noise variances: A Jeffreys' prior is chosen for the noise variance of each spectral 
band aj 

f{al) oc ^1r+ {al) (15) 



which reflects the absence of knowledge for this parameter (see |32| for motivations). As- 
suming prior independence between the noise variances, we obtain 



f\^') = \{f{^J)- (16) 



1=1 



C. Hyperparameter priors 

The performance of the proposed Bayesian model for spectral unmixing depends on the 
values of the hyperparameters ol and w. When the hyperparameters are difficult to adjust, it is 
classical to include them in the unknown parameter vector, resulting in a hierarchical Bayesian 



model [19 1, [33|. This strategy requires to define prior distributions for the hyperparameters. 



^2 



A conjugate inverse-Gamma prior is assigned to a^ 

al^Xg{^,u) (17) 

where (7, v) are real parameters fixed to obtain a flat prior, reflecting the absence of knowl- 
edge about the variance al ((7, u) will be set to (10^^, 10^^) in the simulation section). A 
uniform prior distribution is assigned to the hyperparameter w 

w~W[o,i](w^) (18) 

since there is no a priori information regarding the proportions of linearly and nonlinearly 
mixed pixels in the image. The resulting directed acyclic graph (DAG) associated with the 
proposed Bayesian model is depicted in Fig. [T] 
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D. Joint posterior distribution 

The joint posterior distribution of the unknown parameter/hyperparameter vector {^,$} 
where 6 = {Z, M, b, cr^} and # = {af, w} can be computed using the following hierarchical 
structure 

/(0,$|Y)(x/(Y|0,$)/(0,$) (19) 

where f(Y\0) has been defined in ([8]). By assuming a priori independence between the 
parameters Z, M, b and cr^ and between the hyperparameters ab and w, the joint prior 
distribution of the unknown parameter vector can be expressed as 

= f{Z)f{M)f{a')f{b\alw)f{al)f{w). (20) 

The joint posterior distribution /(0,$|Y) can then be computed up to a multiplicative 
constant after replacing ([20]) and ([8]) in ([19]). Unfortunately, it is difficult to obtain closed 



form expressions for the standard Bayesian estimators (including the maximum a posteriori 



(MAP) and the minimum mean square error (MMSE) estimators) associated with (19). In 



this paper, we propose to use efficient Markov Chain Monte Carlo (MCMC) methods to 



generate samples asymptotically distributed according to ( [T9| ). Due to the large number of 
parameters to be sampled, we use an HMC algorithm which allows the number of sampling 
steps to be reduced and which improves the mixing properties of the sampler. The generated 
samples are then used to compute the MMSE estimator of the unknown parameter vector 
(0, $). The next section summarizes the basic principles of the HMC methods that will be 



used to sample asymptotically from (19). 



IV. Constrained Hamiltonian Monte Carlo method 

HMCs are powerful methods for sampling from many continuous distributions by intro- 
ducing fictitious momentum variables. Let q E M^ be the parameter of interest and n(q) its 
corresponding distribution to be sampled from. From statistical mechanics, the distribution 
7r(q) can be related to a potential energy function U{q) = — log [n{q)\+c where c is a positive 
constant such that Jexp (—U{q) + c)dq = 1. The Hamiltonian of 7r(g) is a function of the 
energy U{q) and of an additional momentum vector p E M^ defined as 

H{q,p) = U{q) + K{p) (21) 
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Fig. 1. DAG for the parameter and hyperparameter priors (thie fixed parameters appear in boxes). 



where K{p) is an arbitrary kinetic energy function. Usually, a quadratic kinetic energy is 
chosen and we propose to use K(p) = p^p/2 in this paper (for reasons explained later). 



The Hamiltonian (21) defines the following distribution 



f{q,p) oc exp[-H{q,p)] 

1 



oc TT[q) exp 



p^p 



(22) 



for {q,p) which shows that q and p are independent and that the marginal distribution of p is 
a A/'(Od,Id) distribution. The HMC algorithm allows samples to be asymptotically generated 
according to ([22]). The ith HMC iteration starts with an initial pair of vectors (g'^*\p*^*)) and 
consists of two steps. The first step resamples the initial momentum p^*^ according to the 
standard multivariate Gaussian distribution. The second step uses Hamiltonian dynamics to 
propose a candidate {q*,p*) which is accepted with the following probability 



p = min {exp [-H{q*,p*) + iJ(qr«,p«)] , l} . 



(23) 



A. Generation of the candidate {q*,p*) 

Hamiltonian dynamics are usually simulated by discretization methods such as Euler or 
leapfrog methods. The classical leapfrog method is a discretization scheme composed of A^lf 
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steps with a discretization stepsize e. The nth leapfrog step can be expressed as 

e dU 

(i,(n+l)e) _ (i,ne) _|_ gp(i,ne+e/2) (24b) 



p(.n.+e/2) ^ pi^,ne) _ ^_^ (^(.ne)) (24a) 



(i,(n+l)e) ^ (i,ne+e/2) _ ^ _^ r (i,(„+i),)i ^ (24c) 

The leapfrog method starts with (g^^'^^p*-*^) = (g*-*\p^*'') and the candidate is set after A^lf 

steps to {q*,p*) = (g(i'^^LF)^p(i,^A'LF))_ 

However, if q is subject to constraints, more sophisticated discretization methods must be 
used. Assume that the vector of interest q = [qi, . . . , qn]^ satisfies the following constraints 

qi<qd<qu, de{l,...,D} (25) 

where qi (resp. q^) is the lower (resp. upper) bound for q^ (such kind of constraints need 
to be satisfied by the elements of Z and the endmembers in M). In this paper we propose 
to use the constrained leapfrog scheme studied in [^, Chap. 5], consisting of A^lf steps, 
with a discretization stepsize e^. Each CHMC iteration starts in a similar way to the classical 



leapfrog method, with the sequential sampling of the momentum p ( |24a[ ) and the vector 
q ( |24b| ). However, if the generated vector q violates the constraints ( [25] ), it is modified 
depending on the violated constraints and the momentum is negated (see [|25] Chap. 5] for 
more details). This step is repeated until each component of the generated q satisfies the 



contraints. The CHMC ends with the update of the momentum p ( |24c[ ). One iteration of 



the resulting constrained HMC algorithm (CHMC) is summarized in Algo. [TJ As mentioned 
above, one might think of using a more sophisticated kinetic energy for p to improve the 
performance of the HMC algorithm. However, the kinetic energy K(p) = p^p/2 allows the 
discretization method handling the constraints to be simple and will provide good performance 
for our application (as will be shown in Section |VI]). The performance of the HMC mainly 
relies on the values of the parameters A^lf and tq. Fortunately, the choice of tq is almost 
independent of A^lf such that these two parameters can be tuned sequentially. The procedures 
used in this paper to adjust A'lf and e^ are detailed in the next paragraphs. 

B. Tuning the stepsize e^ 

The step size e^ is related to the accuracy of the leapfrog method to approximate the 
Hamiltonian dynamics. When e^ is "small", the approximation of the Hamiltonian dynamic 
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is accurate and the acceptance rate ([23]) is high. However, the exploration of the distribution 
support is slow (for a given A^lf)- In this paper, we propose to tune the stepsize during the 
burn-in period of the sampler. More precisely, the stepsize is decreased (resp. increased) by 
25% if the average acceptance rate over the last 50 iterations is smaller than 0.5 (resp. higher 
than 0.8). Note that the stepsize update only happens during the burn-in period to ensure the 
Markov chain is homogeneous after the burn-in period. 

C. Tuning the number of leapfrog steps Nip 

Assume e^ has been correctly adjusted. Too small values of A^lf lead to a slow exploration 
of the distribution (random walk behavior) whereas too high values of A^lf require high 
computational time. Similarly to the stepsize e^, the optimal choice of A'lf depends on 
the distribution to be sampled. The sampling procedure proposed in this paper consists of 
several HMC updates included in a Gibbs sampler (as will be shown in the next section). 
The number of leapfrog steps required for each of these CHMC updates has been adjusted by 
cross-validation. From preliminary runs, we have observed that setting the number of leapfrog 
steps for each HMC update close to A^lf = 50 provides a reasonable tradeoff ensuring a good 
exploration of the target distribution and a reasonable computational complexity. To avoid 
possible periodic trajectories, it is recommended to let A'lf random [ |25| Chap. 5]. In this 
paper, we have assumed that A^lf is uniformly drawn in the interval [45, 55] at each iteration 
of the Gibbs sampler. The next section presents the Gibbs sampler (including CHMC steps) 



which is proposed to sample according to (19) 
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Algorithm 1 
Constrained Hamiltonian Monte Carlo iteration 



1: %Initialization of the ith iteration(n = 0) 
• q(«.o) — qr(*) satisfying the constraints 
. Sample p(''0) ^ p(')) ^ AA (Od,Id) 

2: %Modified leapfrog steps 

3: for 71 = : iVLF - 1 do 

4: %Standard leapfrog steps 

5: . Compute p(«."^+<^/2) = pC^"^) _ £ (q(^"0) 

Compute q(^("+i)^) = q(^"«) + ^p(t.ne+e/2) 

6: %Steps required to ensure q(*'("+i)'^) satisfies (25 1 



7: while qr(*^("+i)«) does not satisfy (|25]l do 
8: for d = 1 : D do 

9: if g(*'("+i)^) < g; then 

10: Set g(^^("+l)^) = 2qi - qf^''+^^'^ 

(replace q]^ by its symmetric with respect to qi) 

12: end if 

13: if g^*+'^ > qu then 

14: Set g(''("+i)') = 2g„ - ^(^■("+i)^) 

(replace q)l^ by its symmetric with respect to qu) 

15: Set p^^'"^+^/2) = _pC«,»^+^/2) 

16: end if 

17: end for 

18: end while 

19: %Standard leapfrog step 

20: Compute ^(^"+1)^) ^ p(^,ne+e/2) - £ 1^ [q(^("+l)^)] 

21: end for 

22: %Accept-reject procedure 

23: Set p* = ^(^^^lf) ^nd q* = qi^-^^^^^) 

24: Compute p using ( |23| ) 

25: Set (qr(*+i),p('+i)) = [q* ,p*) with probability p 

26: Else set (q('+i),p(^+i)) = (q^^p^'^). 
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V. GiBBS Sampler 

The principle of the Gibbs sampler is to sample according to the conditional distributions 
of the posterior of interest [26, Chap. 10]. Due to the large number of parameters to be 
estimated, it makes sense to use a block Gibbs sampler to improve the convergence of the 
sampling procedure. More precisely, we propose to sample sequentially M, Z, b, cr^, af and 
w using six moves that are detailed in the next sections. 

A. Sampling the coefficient matrix Z 

Sampling from /(Z|Y, M, b, cr^, cr^, w) is difficult due to the complexity of this distribu- 
tion. In this case, it is classical to use an accept/reject procedure to update the coefficient 
matrix Z (leading to a hybrid Metropolis-Within-Gibbs sampler). Since the elements of Z 
satisfy the constraints Q, the CHMC studied in Section IV could be used to sample according 



to the conditional distribution /(Z|Y, M, b, cr^, cr^,, w). However, as for Metropolis-Hastings 
updates, the convergence of HMCs generally slows down when the dimensionality of the 
vector to be sampled increases. Consequently, sampling an A^(i?— l)-dimensional vector using 
the proposed CHMC can be inefficient when the number of pixels is very large. However, it 
can be shown that 

TV 

/(Z|Y, M, b, ct\ a,, w) = l[ /(2„|y„, M, 6„, a^), (26) 

n=l 

i.e., the N coefficients vectors {zn}n=i n ^^ ^ posteriori independent and can be sampled 
independently in a parallel manner. Straightforward computations lead to 

(yn ~ Xn) S (y„ — X^ 



/(z„|yn,M,6„,cr ) oc exp 



R-l 



X l(o,i)«-i (z„) J] ^J/-i (27) 

r 

where x„ = gr^ (Ma„), l(o,i)fl-i (■) denotes the indicator function over (0, 1)'^^^. The distri- 



bution pTj ) is related to the following potential energy 

(yn-a3„)^S"^(y„-a3„) 



U{zn) 



2 
i?-i 

- J2^og{z^-^-') (28) 

r=l 

where we note that /(z„|y„, M, 6„, cr^) oc exp [— 1/(2„)]. N momentum vectors associated 



with a canonical kinetic energy are introduced. The CHMC of Section IV is then applied 
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independently to the A^ vectors 2„ whose dimension (R — 1) is relatively small. The partial 
derivatives of the potential function ( |28| ) required in Algo. [T] are derived in the Appendix. 

B. Sampling the endmember matrix M 






From ( |T9| ) and ( |20| ), it can be seen that 

L 

/(M| Y, Z, b, 0-2, s\ M) = J] /(m,, |y,,, Z, b, a^, s^, m,,) 

£=1 

where m^,; (resp. m^,: and y^,;) is the £th row of M (resp. of M and Y) and 

/ Ikr ■/■ l|2 

/(m£_:|y^,;,Z,6,(7^,s , m^ J oc exp I — ^ 

|2" 
iii£,: I 

with ti = A'^nii^: + diag (6) [(A^m^:) (A-^m^:)]. Consequently, the rows of the end- 
member matrix M can be sampled independently similarly to the procedure described in 
the previous section (to sample Z). More precisely, we introduce a potential energy l^(m£ .) 
associated with m^ ^ defined by 

Vir.,,) = Mi_^ + ^_^ (30) 

and a momentum vector associated with a canonical kinetic energy. The partial derivatives 
of the potential function ([30]) required in Algo. [T] are derived in the Appendix. 



C. Sampling the nonlinearity parameter vector b 

Using ([19]) and ( [20] ), it can be easily shown that the conditional distribution of hj^n, Mz^, cr^, w, al 
is the following Bernoulli-Gaussian distribution 

6„|y„, M, z„, 0-2, w, al ~ (1 - 0(5(6„) + wlM (/i„, si) (31) 

where 

al (y„ - Manf T.~^hn ^ _ ^b 



^" aXS-^^n + 1 ' '^ aXS-^^n + 1 

and h.„ = (Ma„) (Ma„). Moreover, 

w 



/3„ + w(l-/3„) 

— exp - — 

s„, V 2s 



/3n = -exp(-£|). (32) 



For each 6„, the conditional distribution ( [31] ) does not depend on {hk}).^^^. Consequently, the 



nonlinearity parameters {6„}^^^ ^ can be sampled independently in a parallel manner. 
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D. Sampling the noise variance vector cr^ 



By considering the posterior distribution ( |T9| ), it can be shown that 



L 



/(o-2|Y, M, Z, b) = J] /(a,%,„ m,,, Z, h) (33) 



£=1 
,2 



and that cr^ly^,:, ni:/, Z, b is distributed according to the following inverse-gamma distribution 



ai\y,,, m,,, Z,b^Xgl-, ^^ '-^-^^ ^ 1 (34) 

where X = [xi^.,, . . . ,xl,-]'^. Thus the noise variances can be sampled easily and indepen- 
dently. 



Algorithm 2 
Gibbs sampler 

1: Initialization ^ = 

. Z(o),M(o),6(°),<T2(o),u;(o),a,'(°). 
Iterations 
for t = 1 : TVmc do 

Parameter update 

Sample Z^*-* from the pdfs (J27]l using a CHMC procedure. 

Sample M^*^ from the pdfs ( |29] l using a CHMC procedure. 

Sample 6*^*' from the pdfs ([31). 

Sample cr2(*) from the pdfs ([34|. 

Hyperparameter update 

Sample a^^^ from the pdf ( |35] l. 

Sample lu'^*^ from the pdf ([36]). 
end for 



£. Sampling the hyperparameters a^ and w 

Looking carefully at the posterior distribution ( [T9| ), it can be seen that a^|b, 7, z/ is dis- 



tributed according to the following inverse-gamma distribution 



al\b,^,u^ig{'^+^,J2^j + u] (35) 

ne/i 
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with Ji = {n\bn 7^ 0}, Uq = ||b||g (where ||-||p is the io norm, i.e., the number of elements of 
b that are different from zero) and ni = N — no, from which it is easy to sample. Similarly, 
we obtain 

w\br^ Be{ni + 1, no + 1). (36) 

Finally, the Gibbs sampler (including HMC procedures) used to sample according to the 
posterior ([19]) consists of the six steps summarized in Algo.|2] The small number of sampling 



steps is due to the high parallelization properties of the proposed sampling procedure, i.e., 
the generation of the A^ coefficient vectors {Zn}^^^ ^, the N nonlinearity parameters 
{hn}n=i N ^'^^ ^^^ ^ reflectance vectors {m^,:}^^i j^- After generating A^mc samples using 
the procedures detailed above, the MMSE estimator of the unknown parameters can be 
approximated by computing the empirical averages of these samples, after an appropriate 
burn-in periocQ The next section studies the performance of the proposed algorithm for 
synthetic hyperspectral images. 

VL Simulations on synthetic data 

A. Simulation scenario 

The performance of the proposed nonlinear SU algorithm is first evaluated by unmixing 3 
synthetic images of size 50 x 50 pixels. The R = 3 endmembers observed at L = 207 different 
spectral bands and contained in these images have been extracted from the spectral libraries 



provided with the ENVI software [35 1 (i.e., green grass, olive green paint and galvanized 
steel metal). The first synthetic image h has been generated using the standard linear mixing 
model (LMM). A second image I2 has been generated according to the PPNMM and a third 
image L^ has been generated according to the generalized bilinear mixing model (GBM) 
presented in p2| . For each image, the abundance vectors an,n = 1,...,2500 have been 
randomly generated according to a uniform distribution in the admissible set defined by 

St= la 0<ar< 0.9, ^ a^ = 1 I . (37) 

Note that the conditions a, < 0.9 ensure that there is no pure pixel in the images, which 
makes the unmixing problem more challenging. All images have been corrupted by an 
additive independent and identically distributed (i.i.d) Gaussian noise of variance cr^ = 10^"^, 



The length of the bum-in period has been determined using appropriate convergence diagnoses 



34 
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corresponding to an average signal-to-noise ratio SNR ~ 21dB for the three images. The 
noise is assumed to be i.i.d. to fairly compare unmixing performance with SU algorithms 
assuming i.i.d. Gaussian noise. The nonlinearity coefficients are uniformly drawn in the set 
[0, 1] for the GBM. The parameters bn,n = 1, . . . ,N have been generated uniformly in the 
set [-0.3, 0.3] for the PPNMM. 

B. Comparison with other SU procedures 

Different estimation procedures have been considered for the three mixing models. More 
precisely, 

• Two unmixing algorithms have been considered for the LMM. The first strategy extracts 



the endmembers from the whole image using the N-FINDR algorithm |20| and estimates 
the abundances using the FCLS algorithm [[2| (it is referred to as "SLMM" for supervised 
LMM). The second strategy is a Bayesian algorithm which jointly estimates the end- 



members and the abundance matrix p3| (it is referred to as "ULMM" for unsupervised 

LMM). 

Two approaches have also been considered for the PPNMM. The first strategy uses the 

nonhnear EEA studied in [ |23| and the gradient-based approach based on the PPNMM 



studied in [ 19 1 for estimating the abundances and the nonlinearity parameter. This 
strategy is referred to as "SPPNMM" (supervised PPNMM). The second strategy is 
the proposed unmixing procedure referred to as "UPPNMM" (unsupervised PPNMM). 



The unmixing strategy used for the GBM is the nonlinear EEA studied in [23 1 and the 



gradient-based algorithm presented in [36| for abundance estimation. 



The quality of the unmixing procedures can be measured by comparing the estimated and 
actual abundance vector using the root normalized mean square error (RNMSE) defined by 



RNMSE 



\ 



1 ^ 
> a„ — a„ (38) 

ra=l 

where a„ and a„ are the actual and estimated abundance vectors for the nth pixel of the 
image and A^ is the number of image pixels. Table |l] shows the RNMSEs associated with the 
images Ii, . . . ,1^ for the different estimation procedures. These results show that the proposed 
UPPNMM performs better (in term of RNMSE) than the other considered unmixing methods 
for the three images. Moreover, the proposed method provides similar results when compared 
with the ULMM for the linearly mixed image h. 
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TABLE I 
Abundance RNMSEs (xlO^^): synthetic images. 







h 

(LMM) 


(PPNMM) 


(GBM) 


LMM 


SLMM 


3.78 


13.21 


6.83 


ULMM 


0.66 


10.87 


4.21 


PPNMM 


SPPNMM 


4.18 


6.04 


4.13 


UPPNMM 


0.37 


0.81 


1.38 


GBM 


4.18 


11.15 


5.02 



Fig.|2]compares the endmember simplexes estimated by Heylen's method [|23| (black) (used 
to build the endmember prior) and by the proposed method (red) to the actual endmembers 
(green stars). For visualization, the observed pixels and the actual and estimated endmembers 
have been projected onto the three first axes provided by the principal component analysis. 
These figures show that the proposed unmixing procedure provides accurate estimated end- 
members for the three images Ji to I^. Due to the absence of pure pixels in the image, the 
manifold generated by the observed pixels Y is difficult to estimate. This explains the limited 
performance obtained with Heylen's method. Conversely, the use of the prior ([12]) allows the 
endmembers 111^ to depart from the prior estimations 111^ leading to improved performance. 

The quality of endmember estimation is also evaluated by the spectral angle mapper (SAM) 
defined as 

(m,,,m,) 



SAM 



arccos 



(39) 



|m,f. II llrrir 

where 01^, is the rth actual endmember and 111^ its estimate. The smaller |SAM|, the closer 
the estimated endmembers to their actual values. Table [ll] compares the performance of the 
different endmember estimation algorithms. This table shows that the proposed UPPNMM 
generally provides more accurate endmember estimates than the others methods. Moreover, 
these results illustrate the robustness of the PPNMM regarding model mis-specification. Note 
that the ULMM and the UPPNMM provide similar results (in term of SAMs) for the image 
Ji generated according to the LMM. 

Finally, the unmixing quality can be evaluated by the reconstruction error (RE) defined as 



RE 



\ 



1 ^ 
__ \^ II " 

NL ^ "^" 



Ynl 



(40) 



ra=l 
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(b)/2 



(c)/3 



Fig. 2. Visualization of tlie A'^ = 2500 pixels (blue dots) of /i, I2 and /a using the first principal components provided 
by the standard PCA. The green stars correspond to the actual endmembers and the triangles are the simplexes defined by 
the endmembers estimated by the Heylen's method (black) and the proposed method (red). 



where y„ is the nth observation vector and y„ its estimate. Table III compares the REs 
obtained for the different synthetic images. These results show that the REs are close for the 
different unmixing algorithms even if the estimated abundances can vary more significantly 
(see Table |I]). Again, the proposed PPNMM seems to be more robust than the other mixing 
models to deviations from the actual model in term of RE. 



C. Analysis of the estimated nonlinearity parameters 

As mentioned above, one of the major properties of the PPNMM is its ability to characterize 
the linearity /nonlinearity of the underlying mixing model for each pixel of the image via the 
nonlinearity parameter 6„. Fig. [3] shows the nonlinearity parameter distribution estimated for 
the three images h to h using the UPPNMM. This figure shows that the UPPNMM clearly 
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TABLE 11 
SAMS (xlO^^): SYNTHETIC IMAGES. 







N-Findr 


ULMM 


Heylen 


UPPNMM 


h 


mi 


5.68 


0.95 


6.42 


0.27 


m2 


5.85 


0.32 


7.46 


0.36 


ms 


3.31 


0.30 


5.26 


0.27 


h 


mi 


9.27 


9.68 


6.71 


0.59 


m2 


8.58 


8.67 


11.80 


0.38 


ms 


4.47 


6.34 


4.98 


0.26 


h 


mi 


7.35 


3.42 


6.48 


1.50 


m2 


10.68 


3.13 


11.88 


3.22 


ms 


4.34 


7.44 


3.20 


0.85 



TABLE III 

RES (XIO^^): SYNTHETIC IMAGES. 







h 

(LMM) 


h 
(PPNMM) 


h 
(GBM) 


LMM 


SLMM 


1.04 


1.74 


15.16 


ULMM 


0.99 


1.43 


1.07 


PPNMM 


SPPNMM 


1.26 


1.27 


1.31 


UPPNMM 


0.99 


0.99 


0.99 


GBM 


1.27 


1.64 


1.33 



identifies the linear mixtures of the image Ji whereas more nonlinearly mixed pixels can be 
identified in the images I2 and I^. The analysis of Fig. [3] also shows that the nonlinearities 
contained in the image L^ (GBM) are generally less significant than the nonhnearities affecting 
I2 (PPNMM) for a same signal-to-noise ratio (SNR ~ 21dB). 

D. Performance for different numbers of endmembers 

The next set of simulations analyzes the performance of the proposed UPPNMM algorithm 
for different numbers of endmembers {R G {4, 5, 6}) by unmixing three synthetic images 
of A^ = 2500 pixels distributed according to the PPNMM. The endmembers contained 
in these images have been extracted from the spectral libraries provided with the ENVI 
software [35]. For each image, the abundance vectors a„, n = 1, . . . , A^ have been randomly 
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Fig. 3. Distributions of the nonlinearity parameters b„ for tlie images /i (left), I2 (middle) and /a (right). 



generated according to a uniform distribution over the admissible set pT) . All images have 
been corrupted by an additive white Gaussian noise corresponding to a^ = 10"'^. The 



nonlinearity coefficients 6„ are uniformly drawn in the set [—0.3,0.3]. Tables IV compares 



the performance of the proposed method in term of endmember estimation (average SAMs 
of the R endmembers), abundance estimation and reconstruction error. These results show a 
general degradation of the abundance and endmember estimations when R is increasing (this 
is intuitive since estimator variances usually increase with the number of parameters to be 
estimated). However, this degradation is reasonable when compared to Heylen's method. 
The proposed algorithm still provides accurate estimates, as illustrated in Fig. |4] which 
compares the actual and estimated endmembers associated with the image containing R = 6 
endmembers. 

TABLE IV 

Unmixing performanceisynthetic images. 







i? = 4 


R^5 


R = 6 


Average SAMs (xlO"^) 


SPPNMM 


7.76 


10.78 


18.53 


UPPNMM 


0.47 


0.81 


1.09 


RNMSEs (xlO-2) 


SPPNMM 


7.58 


10.95 


16.52 


UPPNMM 


0.78 


1.23 


1.47 


REs (xlO'2) 


SPPNMM 


1.36 


1.46 


1.64 


UPPNMM 


0.99 


0.99 


0.99 
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Fig. 4. Actual endmembers (blue dots) and the endmembers estimated by Heylen's method (black lines) and the UPPNMM 
(red lines) for the synthetic image containing R — 6 endmembers. 



VII. Simulations on real data 



A. Data sets 



The real image considered in this section was acquired in 2010 by the Hyspex hyperspectral 
scanner over Villelongue, France (00 03'W and 4257'N). L = 160 spectral bands were 
recorded from the visible to near infrared with a spatial resolution of 0.5m. This dataset 



has already been studied in [16|, |37| and is mainly composed of forested and urban areas. 



More details about the data acquisition and pre-processing steps are available in [37 1. Two 
sub-images denoted as scene #1 and scene #2 (of size 31 x 30 and 50 x 50 pixels) are 
chosen here to evaluate the proposed unmixing procedure and are depicted in Fig. [5] (bottom 
images). The scene #1 is mainly composed of road, ditch and grass pixels. The scene ^2 
is more complex since it includes shadowed pixels. For this image, shadow is considered as 
an additional endmember, resulting in i? = 4 endmembers, i.e., tree, grass, soil and shadow. 
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B. Endmember and abundance estimation 



The endmember s extracted by N-FINDR, the ULMM algorithm [33 1 and Heylen's method 



[23 1 with i? = 3 (resp. R = A) for the scene #1 (resp. scene #2) are compared with 
the endmembers estimated by the UPPNMM in Fig. [6] (resp. Fig. |7]). For the scene #1, 
the four algorithms provide similar endmember estimates whereas the estimated shadow 
spectra are different for the scene #2. The N-FINDR algorithm and Heylen's method estimate 
endmembers as the purest pixels of the observed image, which can be problematic when 
there is no pure pixel in the image (as it occurs with shadowed pixels in the scene #2). 
Conversely, the ULMM and UPPNMM methods, which jointly estimate the endmembers 
and the abundances seem to provide more relevant shadow spectra (of lower amplitude). 
Examples of abundance maps for the scene ^^1 (resp. scene #2), estimated by the ULMM 
and the UPPNMM algorithms are presented in Fig. [8] (resp. Fig. |9]). The abundance maps 
obtained by the UPPNMM are similar to the abundance maps obtained with ULMM. 

C. Analysis of nonlinearities 



Fig. 10 shows the estimated maps of 6„ for the two considered images. Different nonlinear 
regions can be identified in the scene #1, mainly in the grass-planted region (probably due 
to endmember variability) and near the ditch (presence of relief). For the scene #2, nonlinear 
effects are mainly detected in shadowed pixels. 

D. Estimation of noise variances 



Fig. 1 1 compares the noise variance estimated by the UPPNMM for the two real images 



with the noise variance estimated by the HySime algorithm [38 1. The HySime algorithm 



assumes additive noise and estimates the noise covariance matrix of the image using multiple 



regression. Fig. 1 1 first shows that the two algorithms provides similar noise variance esti- 
mates. Moreover, these results motivate the consideration of non i.i.d. noise for hyperspectral 
image analysis since the noise variances increase for the higher wavelengths for the two 
images. 

E. Image reconstruction 

The proposed algorithm is finally evaluated from the REs associated with the two real 
images. These REs are compared in Table [V] with those obtained by assuming other mixing 
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Fig. 5. Top: real hyperspectral Madonna data acquired by the Hyspex hyperspectral scanner over Villelongue, France. 
Bottom: Scene ^1 (left) and Scene #2 (right) shown in true colors. 
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Fig. 6. The i? = 3 endmembers estimated by N-Findr (blue lines), ULMM (green lines), Heylen's method (black lines) 
and the UPPNMM (red lines) for the scene #1. 
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Fig. 7. The R = 4 endmembers estimated by N-Findr (blue lines), ULMM (green lines), Heylen's method (black lines) 
and the UPPNMM (red lines) for the scene #2. 



models. The two unsupervised algorithms (ULMM and UPPNMM) provide smaller REs than 
the SU procedures decomposed into two steps. This observation motivates the use of joint 
abundance and endmember estimation algorithms. 



TABLE V 

REs (xlO~^): Real image. 



Scene #1 



Scene #2 



SLMM 



LMM 



ULMM 



1.53 



1.11 



1.04 



0.88 



SPPNMM 



PPNMM 



UPPNMM 
GBM 



1.50 



1.08 

1.72 



1.17 



0.89 
1.25 
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Fig. 8. Abundance maps estimated by the SLMM, the GBM and the UPPNMM algorithms for the scene #1. 

VIII. Conclusions and future work 

We proposed a new hierarchical Bayesian algorithm for unsupervised nonlinear spectral 
unmixing of hyperspectral images. This algorithm assumed that each pixel of the image 
is a post-nonlinear mixture of the endmembers contaminated by additive Gaussian noise. 
The physical constraints for the abundances and endmembers were included in the Bayesian 
framework through appropriate prior distributions. Due to the complexity of the resulting 
joint posterior distribution, a Markov chain Monte Carlo method was used to approximate 
the MMSE estimator of the unknown model parameters. Because of the large number of 
parameters to be estimated, Hamiltonian Monte Carlo methods were used to reduce the 
sampling procedure complexity and to improve the mixing properties of the proposed sampler. 
Simulations conducted on synthetic data illustrated the performance of the proposed algorithm 
for linear and nonlinear spectral unmixing. An important advantage of the proposed algorithm 
is its flexibility regarding the absence of pure pixels in the image. Another interesting property 
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Fig. 9. Abundance maps estimated by the SLMM, the GBM and the UPPNMM algorithms for the scene #2. 





(a) Scene #1 



(b) Scene #2 



Fig. 10. Maps of the nonlinearity parameter b„ estimated by the UPPNMM for the real images. 

resulting from the post-nonlinear mixing model is the possibility of detecting nonlinearly from 
linearly mixed pixels. This detection can identify the image regions affected by nonlinearities 
in order to characterize the nonlinear effects more deeply. The number of endmembers 
contained in the hyperspectral image was assumed to be known in this work. We think 
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Fig. 11. Noise variances estimated by the UPPNMM (red) and the Hysime algorithm (blue) for the scene #1 (top) and 
the scene #2 (bottom). 



that estimating the number of components present in the image is an important issue that 
should be considered in future work. Finally, considering endmember variability in linear and 
nonlinear mixing models is an interesting prospect which is currently under investigation. 

Appendix: Derivation of the potential functions 



The potential energy ( [28| ) can be rewritten 



(41) 



where 



U2(Zn) 



1 



iT^^l 



[y„ - g^ (Ma„)]^ S'^ [y, - g^ (Ma^] , 



R~l 



E'osl-r-')- 



r=l 
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dZn 



Partial derivatives of U{zn) with respect to z„ is obtained using the classical chain rule 

dUjZn) _ dUi{an)dan <9^2(^«) 
dZn dan dzn 

Straightforward computations lead to 

dUi{an) 



Bar 



[y„ - g„ {yian)Y S-i [M + 26„ (Ma„ia M] 



dttr 



''r,n 



dzi 



^i,n J- 



if i > r 
if i = r 

if i < r 



dU2{Zr. 

OZi n 



R-i 



Similarly, the potential energy ( [30| ) can be rewritten 

V{me,) = V,{te) + V2{z„] 
with U = A^m^,: + diag (b) [(A^m^^;) (A^m^^;)] and 

f l|2 






ly^: 



2a 



|m£. -m^. 



2s2 



(42) 



(43) 



The partial derivatives of the potential energy (30) can be obtained using the chain rule 



dVirn,,) dViiti) dU ^dV2{jn,, 



and 



druf 



dte 
drag, 

gV2(m£,:) 

dm.1- 



dti dmi ■ 



dnii : 



aj 
A^ + 2diag(6)[(A^m,,lS)0A^ 
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